Search | VHL Regional Portal

1.

Deep integrative models for large-scale human genomics.

Sigurdsson, Arnór I; Louloudis, Ioannis; Banasik, Karina; Westergaard, David; Winther, Ole; Lund, Ole; Ostrowski, Sisse Rye; Erikstrup, Christian; Pedersen, Ole Birger Vesterager; Nyegaard, Mette; Brunak, Søren; Vilhjálmsson, Bjarni J; Rasmussen, Simon.

Nucleic Acids Res ; 51(12): e67, 2023 07 07.

Article in English | MEDLINE | ID: mdl-37224538

ABSTRACT

Polygenic risk scores (PRSs) are expected to play a critical role in precision medicine. Currently, PRS predictors are generally based on linear models using summary statistics, and more recently individual-level data. However, these predictors mainly capture additive relationships and are limited in data modalities they can use. We developed a deep learning framework (EIR) for PRS prediction which includes a model, genome-local-net (GLN), specifically designed for large-scale genomics data. The framework supports multi-task learning, automatic integration of other clinical and biochemical data, and model explainability. When applied to individual-level data from the UK Biobank, the GLN model demonstrated a competitive performance compared to established neural network architectures, particularly for certain traits, showcasing its potential in modeling complex genetic relationships. Furthermore, the GLN model outperformed linear PRS methods for Type 1 Diabetes, likely due to modeling non-additive genetic effects and epistasis. This was supported by our identification of widespread non-additive genetic effects and epistasis in the context of T1D. Finally, we constructed PRS models that integrated genotype, blood, urine, and anthropometric data and found that this improved performance for 93% of the 290 diseases and disorders considered. EIR is available at https://github.com/arnor-sigurdsson/EIR.
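The additive baseline that this abstract contrasts with GLN can be sketched as a weighted sum of allele dosages. This is a minimal illustration and not code from EIR; the function name, the dosages, and the effect sizes are hypothetical.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Additive PRS: sum over variants of allele dosage (0, 1 or 2) times effect size.

    Both arguments are hypothetical inputs for illustration; a real pipeline
    would read genotypes and effect-size estimates from files.
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(d * b for d, b in zip(dosages, effect_sizes))


# One individual, three variants (made-up values).
score = polygenic_risk_score([2, 1, 0], [0.5, 0.25, 1.0])
```

Because each variant contributes independently, a score of this form cannot represent the non-additive effects and epistasis that the abstract reports for T1D.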

Subject(s)

Models, Genetic , Multifactorial Inheritance , Polymorphism, Single Nucleotide , Humans , Genetic Predisposition to Disease , Genome, Human , Genome-Wide Association Study , Genomics/methods , Genotype , Risk Factors

2.

Chenopodium quinoa, a New Host for Alternaria Section Alternata and Alternaria Section Infectoriae Causing Yellow Leaf Blotch Disease.

Colque-Little, Carla; Lund, Ole Søgaard; Andreasen, Christian; Amby, Daniel Buchvaldt.

Plant Dis ; 107(9): 2628-2632, 2023 Sep.

Article in English | MEDLINE | ID: mdl-36880865

ABSTRACT

Quinoa (Chenopodium quinoa Willd.) is a native American crop mainly grown in the Andes of Bolivia and Peru. During the last decades, the cultivation of quinoa has expanded to more than 125 countries. Since then, several diseases of quinoa have been characterized. A leaf disease was observed on quinoa plants growing in an experimental plot in Eastern Denmark in 2018. The symptoms produced by the associated fungi consisted of small yellow blotches on the upper surface of leaves with a pale chlorotic halo surrounding the lesion. These studies used a combination of morphology, molecular diagnostics, and pathogenicity tests to identify two different Alternaria species belonging to Alternaria sections Infectoriae and Alternata as the causal agent of observed disease symptoms. To the best of our knowledge, this is the first report of Alternaria spp. as foliar pathogens of quinoa. Our findings indicate the need for additional studies to determine potential risks to quinoa production.

Subject(s)

Chenopodium quinoa , Chenopodium quinoa/microbiology , Alternaria/genetics , Peru , Plant Leaves/microbiology

3.

SourceFinder: a Machine-Learning-Based Tool for Identification of Chromosomal, Plasmid, and Bacteriophage Sequences from Assemblies.

Aytan-Aktug, Derya; Grigorjev, Vladislav; Szarvas, Judit; Clausen, Philip T L C; Munk, Patrick; Nguyen, Marcus; Davis, James J; Aarestrup, Frank M; Lund, Ole.

Microbiol Spectr ; 10(6): e0264122, 2022 12 21.

Article in English | MEDLINE | ID: mdl-36377945

ABSTRACT

High-throughput genome sequencing technologies enable the investigation of complex genetic interactions, including the horizontal gene transfer of plasmids and bacteriophages. However, identifying these elements from assembled reads remains challenging due to genome sequence plasticity and the difficulty in assembling complete sequences. In this study, we developed a classifier, using random forest, to identify whether sequences originated from bacterial chromosomes, plasmids, or bacteriophages. The classifier was trained on a diverse collection of 23,211 chromosomal, plasmid, and bacteriophage sequences from hundreds of bacterial species. In order to adapt the classifier to incomplete sequences, each complete sequence was subsampled into 5,000 nucleotide fragments and further subdivided into k-mers. This three-class classifier succeeded in identifying chromosomes, plasmids, and bacteriophages using k-mer distributions of complete and partial genome sequences, including simulated metagenomic scaffolds with minimum performance of 0.939 area under the receiver operating characteristic curve (AUC). This classifier, implemented as SourceFinder, has been made available as an online web service to help the community with predicting the chromosomal, plasmid, and bacteriophage sources of assembled bacterial sequence data (https://cge.food.dtu.dk/services/SourceFinder/). IMPORTANCE Extra-chromosomal genes encoding antimicrobial resistance, metal resistance, and virulence provide selective advantages for bacterial survival under stress conditions and pose serious threats to human and animal health. These accessory genes can impact the composition of microbiomes by providing selective advantages to their hosts. Accurately identifying extra-chromosomal elements in genome sequence data is critical for understanding gene dissemination trajectories and taking preventative measures. Therefore, in this study, we developed a random forest classifier for identifying the source of bacterial chromosomal, plasmid, and bacteriophage sequences.
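The fragment-then-k-mer step described in this abstract can be sketched as follows. This is a minimal illustration, not SourceFinder code: the abstract does not state the value of k or how fragments were sampled, so the non-overlapping windows and the function names here are assumptions.

```python
from collections import Counter


def fragments(sequence, size=5000):
    """Split a sequence into consecutive windows of `size` nucleotides.

    Non-overlapping windows are an assumption; the last window may be shorter.
    """
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def kmer_counts(sequence, k):
    """Count every overlapping k-mer in a sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Toy example with a tiny window and k so the result is easy to inspect.
profiles = [kmer_counts(frag, k=2) for frag in fragments("ACGTACGTAC", size=4)]
```

Each resulting count profile is the kind of k-mer distribution that could be passed to a classifier as a feature vector.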

Subject(s)

Bacteriophages , Genome, Bacterial , Humans , Bacteriophages/genetics , Plasmids/genetics , Chromosomes, Bacterial/genetics , Machine Learning

4.

Metagenomic DNA sequencing for semi-quantitative pathogen detection from urine: a prospective, laboratory-based, proof-of-concept study.

Janes, Victoria A; Matamoros, Sébastien; Munk, Patrick; Clausen, Philip T L C; Koekkoek, Sylvie M; Koster, Linda A M; Jakobs, Marja E; de Wever, Bob; Visser, Caroline E; Aarestrup, Frank M; Lund, Ole; de Jong, Menno D; Bossuyt, Patrick M M; Mende, Daniel R; Schultsz, Constance.

Lancet Microbe ; 3(8): e588-e597, 2022 08.

Article in English | MEDLINE | ID: mdl-35688170

ABSTRACT

BACKGROUND: Semi-quantitative bacterial culture is the reference standard to diagnose urinary tract infection, but culture is time-consuming and can be unreliable if patients are receiving antibiotics. Metagenomics could increase diagnostic accuracy and speed by sequencing the microbiota and resistome directly from urine. We aimed to compare metagenomics to culture for semi-quantitative pathogen and resistome detection from urine. METHODS: In this proof-of-concept study, we prospectively included consecutive urine samples from a clinical diagnostic laboratory in Amsterdam. Urine samples were screened by DNA concentration, followed by PCR-free metagenomic sequencing of randomly selected samples with a high concentration of DNA (culture positive and negative). A diagnostic index was calculated as the product of DNA concentration and fraction of pathogen reads. We compared results with semi-quantitative culture using area under the receiver operating characteristic curve (AUROC) analyses. We used ResFinder and PointFinder for resistance gene detection and compared results to phenotypic antimicrobial susceptibility testing for six antibiotics commonly used for urinary tract infection treatment: nitrofurantoin, ciprofloxacin, fosfomycin, cotrimoxazole, ceftazidime, and ceftriaxone. FINDINGS: We screened 529 urine samples of which 86 were sequenced (43 culture positive and 43 culture negative). The AUROC of the DNA concentration-based screening was 0·85 (95% CI 0·81-0·89). At a cutoff value of 6·0 ng/mL, culture positivity was ruled out with a negative predictive value of 91% (95% CI 87-93; 26 of 297 samples), reducing the number of samples requiring sequencing by 56% (297 of 529 samples). The AUROC of the diagnostic index was 0·87 (95% CI 0·79-0·95). A diagnostic index cutoff value of 17·2 yielded a positive predictive value of 93% (95% CI 85-97) and a negative predictive value of 69% (55-80), correcting for a culture-positive prevalence of 66%. Gram-positive pathogens explained eight (89%) of the nine false-negative metagenomic test results. Agreement of phenotypic and genotypic antimicrobial susceptibility testing varied between 71% (22 of 31 samples) and 100% (six of six samples), depending on the antibiotic tested. INTERPRETATION: This study provides proof-of-concept of metagenomic semi-quantitative pathogen and resistome detection for the diagnosis of urinary tract infection. The findings warrant prospective clinical validation of the value of this approach in informing patient management and care. FUNDING: EU Horizon 2020 Research and Innovation Programme.
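The diagnostic index in this abstract is defined as DNA concentration multiplied by the fraction of pathogen reads, with a reported cutoff of 17.2. A minimal sketch of that arithmetic is shown here; the function names, the read counts, and the choice to treat the cutoff as inclusive are assumptions, and the code is not taken from the study.

```python
# Cutoff value reported in the abstract; inclusive comparison is an assumption.
DIAGNOSTIC_INDEX_CUTOFF = 17.2


def diagnostic_index(dna_conc_ng_per_ml, pathogen_reads, total_reads):
    """Product of DNA concentration and the fraction of reads assigned to a pathogen."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return dna_conc_ng_per_ml * (pathogen_reads / total_reads)


def is_metagenomic_positive(index, cutoff=DIAGNOSTIC_INDEX_CUTOFF):
    """Call a sample positive when its index reaches the cutoff."""
    return index >= cutoff


# Hypothetical sample: 40 ng/mL DNA, half of the reads assigned to a pathogen.
index = diagnostic_index(40.0, pathogen_reads=500, total_reads=1000)
```

With these made-up inputs the index is 20.0, which would fall above the reported cutoff.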

Subject(s)

Metagenomics , Urinary Tract Infections , Anti-Bacterial Agents/pharmacology , Humans , Metagenomics/methods , Prospective Studies , Sequence Analysis, DNA , Urinary Tract Infections/diagnosis

5.

PlasmidHostFinder: Prediction of Plasmid Hosts Using Random Forest.

Aytan-Aktug, Derya; Clausen, Philip T L C; Szarvas, Judit; Munk, Patrick; Otani, Saria; Nguyen, Marcus; Davis, James J; Lund, Ole; Aarestrup, Frank M.

mSystems ; 7(2): e0118021, 2022 04 26.

Article in English | MEDLINE | ID: mdl-35382558

ABSTRACT

Plasmids play a major role in facilitating the spread of antimicrobial resistance between bacteria. Understanding the host range and dissemination trajectories of plasmids is critical for surveillance and prevention of antimicrobial resistance. Identification of plasmid host ranges could be improved using automated pattern detection methods compared to homology-based methods due to the diversity and genetic plasticity of plasmids. In this study, we developed a method for predicting the host range of plasmids using machine learning, specifically random forests. We trained the models with 8,519 plasmids from 359 different bacterial species per taxonomic level; the models achieved Matthews correlation coefficients of 0.662 and 0.867 at the species and order levels, respectively. Our results suggest that despite the diverse nature and genetic plasticity of plasmids, our random forest model can accurately distinguish between plasmid hosts. This tool is available online through the Center for Genomic Epidemiology (https://cge.cbs.dtu.dk/services/PlasmidHostFinder/). IMPORTANCE Antimicrobial resistance is a global health threat to humans and animals, causing high mortality and morbidity while effectively ending decades of success in fighting against bacterial infections. Plasmids confer extra genetic capabilities to the host organisms through accessory genes that can encode antimicrobial resistance and virulence. In addition to lateral inheritance, plasmids can be transferred horizontally between bacterial taxa. Therefore, detection of the host range of plasmids is crucial for understanding and predicting the dissemination trajectories of extrachromosomal genes and bacterial evolution as well as taking effective countermeasures against antimicrobial resistance.
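The Matthews correlation coefficient reported in this abstract can be illustrated for the simple binary case. This sketch assumes a two-class confusion matrix, whereas the study predicts many host taxa, so it shows the metric rather than the authors' evaluation; the function name and the zero-denominator convention are assumptions.

```python
import math


def matthews_corrcoef_binary(tp, tn, fp, fn):
    """Binary MCC: (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when any marginal total is zero, a common convention.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


perfect = matthews_corrcoef_binary(tp=5, tn=5, fp=0, fn=0)
inverted = matthews_corrcoef_binary(tp=0, tn=0, fp=5, fn=5)
```

The coefficient ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction correct), which gives context for the reported values of 0.662 and 0.867.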

Subject(s)

Anti-Infective Agents , Random Forest , Animals , Humans , Plasmids , Bacteria/genetics , Genomics

6.

Identification of Single-Nucleotide Polymorphisms in the Mitochondrial Genome and Kelch 13 Gene of Plasmodium falciparum in Different Geographical Populations.

Nydahl, Tine Kliim; Ahorhorlu, Samuel Yao; Ndiaye, Magatte; Das, Manoj Kumar; Hansson, Helle; Bravo, Marina Crespo; Wang, Christian William; Lusingu, John; Theisen, Michael; Singh, Susheel Kumar; Singh, Subhash; Campino, Susana; Lund, Ole; Roper, Cally; Alifrangis, Michael.

Am J Trop Med Hyg ; 105(4): 1085-1092, 2021 07 16.

Article in English | MEDLINE | ID: mdl-34270452

ABSTRACT

The emergence of artemisinin-resistant Plasmodium falciparum parasites in Southeast Asia threatens malaria control and elimination. The interconnectedness of parasite populations may be essential to monitor the spread of resistance. Combining a published barcoding system of geographically restricted single-nucleotide polymorphisms (SNPs), mainly mitochondria of P. falciparum with SNPs in the K13 artemisinin resistance marker, could elucidate the parasite population structure and provide insight regarding the spread of drug resistance. We explored the diversity of mitochondrial SNPs (bp position 611-2825) and identified K13 SNPs from malaria patients in the districts of India (Ranchi), Tanzania (Korogwe), and Senegal (Podor, Richard Toll, Kaolack, and Ndoffane). DNA was amplified using a nested PCR and Sanger-sequenced. Overall, 199 K13 sequences (India: N = 92; Tanzania: N = 48; Senegal: N = 59) and 237 mitochondrial sequences (India: N = 93; Tanzania: N = 48; Senegal: N = 96) were generated. SNPs were identified by comparisons with reference genomes. We detected previously reported geographically restricted mitochondrial SNPs (T2175C and G1367A) as markers for parasites originating from the Indian subcontinent and several geographically unrestricted mitochondrial SNPs. Combining haplotypes with published P. falciparum mitochondrial genome data suggested possible regional differences within India. All three countries had G1692A, but Tanzanian and Senegalese SNPs were well-differentiated. Some mitochondrial SNPs are reported here for the first time. Four nonsynonymous K13 SNPs were detected: K189T (India, Tanzania, Senegal); A175T (Tanzania); and A174V and R255K (Senegal). This study supports the use of mitochondrial SNPs to determine the origin of the parasite and suggests that the P. falciparum populations studied were susceptible to artemisinin during sampling because all K13 SNPs observed were outside the propeller domain for artemisinin resistance.

Subject(s)

DNA, Protozoan/genetics , Genome, Mitochondrial , Plasmodium falciparum/genetics , Polymorphism, Single Nucleotide , Haplotypes , Humans , India/epidemiology , Malaria, Falciparum/epidemiology , Malaria, Falciparum/parasitology

7.

MINTyper: an outbreak-detection method for accurate and rapid SNP typing of clonal clusters with noisy long reads.

Hallgren, Malte B; Overballe-Petersen, Søren; Lund, Ole; Hasman, Henrik; Clausen, Philip T L C.

Biol Methods Protoc ; 6(1): bpab008, 2021.

Article in English | MEDLINE | ID: mdl-33981853

ABSTRACT

For detection of clonal outbreaks in clinical settings, we present a complete pipeline that generates a single-nucleotide polymorphism distance matrix from a set of sequencing reads. Importantly, the program is able to handle a separate mix of both short reads from the Illumina sequencing platforms and long reads from Oxford Nanopore Technologies' (ONT) platforms as input. MINTyper performs automated reference identification, alignment, alignment trimming, optional methylation masking, and pairwise distance calculations. With this approach, we could rapidly and accurately cluster a set of DNA sequenced isolates, with a known epidemiological relationship to confirm the clustering. Functions were built to allow for both high-accuracy methylation-aware base-called MinION reads (hac_m Q10) and fast generated lower-quality reads (fast Q8) to be used, also in combination with Illumina data. With fast Q8 reads, a higher number of base pairs were excluded from the calculated distance matrix, compared with the high-accuracy methylation-aware Q10 base-calling of ONT data. Nonetheless, when using different qualities of ONT data with corresponding input parameters, the clustering of isolates was nearly identical.
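The pairwise distance step named in this abstract can be sketched for sequences that are already aligned to a common reference. This is an illustration under stated assumptions and not MINTyper code: the function names are hypothetical, and treating `N` and `-` as excluded positions is a stand-in for the trimming and masking the pipeline performs.

```python
def snp_distance(seq_a, seq_b, excluded="N-"):
    """Count positions where two aligned sequences differ.

    Positions where either sequence holds an excluded symbol are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a not in excluded and b not in excluded
    )


def distance_matrix(sequences):
    """Symmetric matrix of pairwise SNP distances."""
    n = len(sequences)
    return [[snp_distance(sequences[i], sequences[j]) for j in range(n)]
            for i in range(n)]


# Three toy isolates; the first base of the third is masked.
matrix = distance_matrix(["ACGT", "ACGA", "NCTA"])
```

Excluding more positions, as the abstract describes for lower-quality reads, shrinks the set of sites that can contribute to each distance.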

8.

Rapid Open-Source SNP-Based Clustering Offers an Alternative to Core Genome MLST for Outbreak Tracing in a Hospital Setting.

Szarvas, Judit; Bartels, Mette Damkjaer; Westh, Henrik; Lund, Ole.

Front Microbiol ; 12: 636608, 2021.

Article in English | MEDLINE | ID: mdl-33868194

ABSTRACT

Traditional genotyping methods for infection control of antimicrobial-resistant bacteria in healthcare settings have been supplemented by whole-genome sequencing (WGS), often relying on a gene-based approach, e.g., core genome multilocus sequence typing (cgMLST), to cluster related samples. In this study, we compared clusters of methicillin-resistant Staphylococcus aureus (MRSA) and Enterococcus faecium analyzed with the commercial cgMLST software Ridom SeqSphere+ and with an open-source single-nucleotide polymorphism (SNP)-based phylogenetic analysis pipeline (PAPABAC). A total of 5,655 MRSA and 2,572 E. faecium patient isolates, collected between 2013 and 2018, were processed. Clusters of 1,844 MRSA and 1,355 E. faecium isolates were compared to cgMLST results, and epidemiological data were included when available. The phylogenies inferred by the two different technologies were highly concordant, and the MRSA SNP tree re-captured known hospital-related outbreaks and epidemiologically linked samples. PAPABAC has the advantage over Ridom SeqSphere+ of generating stable, referable clusters without the need for sequence assembly, and it is a free-of-charge, open-source alternative to the commercial software.

9.

Machine learning predicts and provides insights into milk acidification rates of Lactococcus lactis.

Karlsen, Signe Tang; Vesth, Tammi Camilla; Oregaard, Gunnar; Poulsen, Vera Kuzina; Lund, Ole; Henderson, Gemma; Bælum, Jacob.

PLoS One ; 16(3): e0246287, 2021.

Article in English | MEDLINE | ID: mdl-33720959

ABSTRACT

Lactococcus lactis strains are important components in industrial starter cultures for cheese manufacturing. They have many strain-dependent properties, which affect the final product. Here, we explored the use of machine learning to create systematic, high-throughput screening methods for these properties. Fast acidification of milk is such a strain-dependent property. To predict the maximum hourly acidification rate (Vmax), we trained Random Forest (RF) models on four different genomic representations: Presence/absence of gene families, counts of Pfam domains, the 8 nucleotide long subsequences of their DNA (8-mers), and the 9 nucleotide long subsequences of their DNA (9-mers). Vmax was measured at different temperatures, volumes, and in the presence or absence of yeast extract. These conditions were added as features in each RF model. The four models were trained on 257 strains, and the correlation between the measured Vmax and the predicted Vmax was evaluated with Pearson Correlation Coefficients (PC) on a separate dataset of 85 strains. The models all had high PC scores: 0.83 (gene presence/absence model), 0.84 (Pfam domain model), 0.76 (8-mer model), and 0.85 (9-mer model). The models all based their predictions on relevant genetic features and showed consensus on systems for lactose metabolism, degradation of casein, and pH stress response. Each model also predicted a set of features not found by the other models.
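The evaluation metric in this abstract, the Pearson correlation between measured and predicted Vmax, can be written out directly. This is a minimal sketch with made-up numbers; the function name is hypothetical and the code is not from the study.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical measured and predicted values for three strains.
r = pearson_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

A value near 1 means predictions rise and fall with the measurements, even if their scale differs, which is why the metric is usually read alongside an error measure.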

Subject(s)

High-Throughput Screening Assays/methods , Lactococcus lactis/physiology , Milk/chemistry , Animals , Computer Simulation , Food Microbiology , Genome, Bacterial , Hydrogen-Ion Concentration , Lactococcus lactis/genetics , Machine Learning , Milk/microbiology , Models, Theoretical , Whole Genome Sequencing

10.

Genetic variation for tolerance to the downy mildew pathogen Peronospora variabilis in genetic resources of quinoa (Chenopodium quinoa).

Colque-Little, Carla; Abondano, Miguel Correa; Lund, Ole Søgaard; Amby, Daniel Buchvaldt; Piepho, Hans-Peter; Andreasen, Christian; Schmöckel, Sandra; Schmid, Karl.

BMC Plant Biol ; 21(1): 41, 2021 Jan 14.

Article in English | MEDLINE | ID: mdl-33446098

ABSTRACT

BACKGROUND: Quinoa (Chenopodium quinoa Willd.) is an ancient grain crop that is tolerant to abiotic stress and has favorable nutritional properties. Downy mildew is the main disease of quinoa and is caused by infections of the biotrophic oomycete Peronospora variabilis Gaüm. Since the disease causes major yield losses, identifying sources of downy mildew tolerance in genetic resources and understanding its genetic basis are important goals in quinoa breeding. RESULTS: We infected 132 South American genotypes, three Danish cultivars and the weedy relative C. album with a single isolate of P. variabilis under greenhouse conditions and observed a large variation in disease traits like severity of infection, which ranged from 5 to 83%. Linear mixed models revealed a significant effect of genotypes on disease traits with high heritabilities (0.72 to 0.81). Factors like altitude at site of origin or seed saponin content did not correlate with mildew tolerance, but stomatal width was weakly correlated with severity of infection. Despite the strong genotypic effects on mildew tolerance, genome-wide association mapping with 88 genotypes failed to identify significant marker-trait associations, indicating a polygenic architecture of mildew tolerance. CONCLUSIONS: The strong genetic effects on mildew tolerance make it possible to identify genetic resources that are valuable sources of resistance for future quinoa breeding.

Subject(s)

Chenopodium quinoa/genetics , Chenopodium quinoa/microbiology , Genetic Variation , Peronospora/pathogenicity , Plant Diseases/microbiology , Chenopodium album/microbiology , Genome, Plant , Genome-Wide Association Study , Genotype , Host-Pathogen Interactions/genetics , Linear Models , Peronospora/isolation & purification , Plant Diseases/etiology , Plant Diseases/genetics , Saponins/analysis , Seeds/chemistry , South America , Whole Genome Sequencing

11.

Automated download and clean-up of family-specific databases for kmer-based virus identification.

Allesøe, Rosa L; Lemvigh, Camilla K; Phan, My V T; Clausen, Philip T L C; Florensa, Alfred F; Koopmans, Marion P G; Lund, Ole; Cotten, Matthew.

Bioinformatics ; 37(5): 705-710, 2021 05 05.

Article in English | MEDLINE | ID: mdl-33031509

ABSTRACT

SUMMARY: Here, we present an automated pipeline for Download Of NCBI Entries (DONE) and continuous updating of a local sequence database based on user-specified queries. The database can be created with either protein or nucleotide sequences containing all entries or complete genomes only. The pipeline can automatically clean the database by removing entries with matches to a database of user-specified sequence contaminants. The default contamination entries include sequences from the UniVec database of plasmids, marker genes and sequencing adapters from NCBI, an E. coli genome, rRNA sequences, vectors and satellite sequences. Furthermore, duplicates are removed and the database is automatically screened for sequences from green fluorescent protein, luciferase and antibiotic resistance genes that might be present in some GenBank viral entries, and could lead to false positives in virus identification. For utilizing the database, we present a useful opportunity for dealing with possible human contamination. We show the applicability of DONE by downloading a virus database comprising 37 virus families. We observed an average increase of 16 776 new entries downloaded per month for the 37 families. In addition, we demonstrate the utility of a custom database compared to a standard reference database for classifying both simulated and real sequence data. AVAILABILITY AND IMPLEMENTATION: The DONE pipeline for downloading and cleaning is deposited in a publicly available repository (https://bitbucket.org/genomicepidemiology/done/src/master/). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Databases, Genetic , Databases, Nucleic Acid , Genome , Humans , Proteins

12.

AnOxPePred: using deep learning for the prediction of antioxidative properties of peptides.

Olsen, Tobias Hegelund; Yesiltas, Betül; Marin, Frederikke Isa; Pertseva, Margarita; García-Moreno, Pedro J; Gregersen, Simon; Overgaard, Michael Toft; Jacobsen, Charlotte; Lund, Ole; Hansen, Egon Bech; Marcatili, Paolo.

Sci Rep ; 10(1): 21471, 2020 12 08.

Article in English | MEDLINE | ID: mdl-33293615

ABSTRACT

Dietary antioxidants are an important preservative in food and have been suggested to help in disease prevention. With consumer demands for less synthetic and safer additives in food products, the food industry is searching for antioxidants that can be marketed as natural. Peptides derived from natural proteins show promise, as they are generally regarded as safe and potentially contain other beneficial bioactivities. Antioxidative peptides are usually obtained by testing various peptides derived from hydrolysis of proteins by a selection of proteases. This slow and cumbersome trial-and-error approach to identify antioxidative peptides has increased interest in developing computational approaches for prediction of antioxidant activity and thereby reduce laboratory work. A few antioxidant predictors exist, however, no tool predicting the antioxidative properties of peptides is, to the best of our knowledge, currently available as a web-server. We here present the AnOxPePred tool and web-server ( http://services.bioinformatics.dtu.dk/service.php?AnOxPePred-1.0 ) that uses deep learning to predict the antioxidant properties of peptides. Our model was trained on a curated dataset consisting of experimentally-tested antioxidant and non-antioxidant peptides. For a variety of metrics our method displays a prediction performance better than a k-NN sequence identity-based approach. Furthermore, the developed tool will be a good benchmark for future predictors of antioxidant peptides.

Subject(s)

Antioxidants/chemistry , Deep Learning , Food Preservatives/chemistry , Peptides/chemistry , Amino Acid Sequence , Antioxidants/pharmacology , Food Preservatives/pharmacology , Humans , Peptides/pharmacology , Software

13.

Understanding and predicting ciprofloxacin minimum inhibitory concentration in Escherichia coli with machine learning.

Pataki, Bálint Ármin; Matamoros, Sébastien; van der Putten, Boas C L; Remondini, Daniel; Giampieri, Enrico; Aytan-Aktug, Derya; Hendriksen, Rene S; Lund, Ole; Csabai, István; Schultsz, Constance.

Sci Rep ; 10(1): 15026, 2020 09 14.

Article in English | MEDLINE | ID: mdl-32929164

ABSTRACT

It is important that antibiotic prescriptions are based on antimicrobial susceptibility data to ensure effective treatment outcomes. With the increasing availability of next-generation sequencing, bacterial whole genome sequencing (WGS) can provide a more reliable and faster alternative to traditional phenotyping for the detection and surveillance of antimicrobial resistance (AMR). This work proposes a machine learning approach that can predict the minimum inhibitory concentration (MIC) for a given antibiotic, here ciprofloxacin, on the basis of both genome-wide mutation profiles and profiles of acquired antimicrobial resistance genes. We analysed 704 Escherichia coli genomes combined with their respective MIC measurements for ciprofloxacin originating from different countries. The four most important predictors found by the model, mutations in gyrA residues Ser83 and Asp87, a mutation in parC residue Ser80 and presence of the qnrS1 gene, have been experimentally validated before. Using only these four predictors in a linear regression model, 65% and 93% of the test samples' MIC were correctly predicted within a two- and a four-fold dilution range, respectively. The presented work does not treat machine learning as a black box, but also identifies the genomic features that determine susceptibility. The recent progress in WGS technology in combination with machine learning analysis approaches indicates that in the near future WGS of bacteria might become cheaper and faster than a MIC measurement.
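The "within a two- and a four-fold dilution range" accuracy reported in this abstract can be sketched as a tolerance on the log2 ratio of predicted to measured MIC. The exact definition used by the authors is not given in the abstract, so the symmetric interpretation below, the function names, and the example values are assumptions.

```python
import math


def within_fold(predicted_mic, measured_mic, fold):
    """True when the prediction lies within a factor of `fold` of the measurement."""
    if predicted_mic <= 0 or measured_mic <= 0 or fold < 1:
        raise ValueError("MIC values must be positive and fold must be >= 1")
    # Small tolerance guards against floating-point rounding at the boundary.
    return abs(math.log2(predicted_mic / measured_mic)) <= math.log2(fold) + 1e-9


def fraction_within_fold(predicted, measured, fold):
    """Share of samples whose prediction is within `fold` of the measurement."""
    pairs = list(zip(predicted, measured))
    return sum(within_fold(p, m, fold) for p, m in pairs) / len(pairs)


# Hypothetical MIC values in mg/L for three isolates.
predicted = [0.5, 0.25, 8.0]
measured = [1.0, 1.0, 8.0]
two_fold = fraction_within_fold(predicted, measured, fold=2)
four_fold = fraction_within_fold(predicted, measured, fold=4)
```

Working on the log2 scale matches the doubling dilution series in which MIC values are normally measured.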

Subject(s)

Anti-Bacterial Agents/toxicity , Ciprofloxacin/toxicity , Drug Resistance, Bacterial , Genes, Bacterial , Machine Learning , DNA Gyrase/genetics , Escherichia coli/drug effects , Escherichia coli/genetics , Escherichia coli Proteins/genetics , Inhibitory Concentration 50 , Intracellular Signaling Peptides and Proteins/genetics , Mutation , Toxicity Tests/methods

14.

ResFinder 4.0 for predictions of phenotypes from genotypes.

Bortolaia, Valeria; Kaas, Rolf S; Ruppe, Etienne; Roberts, Marilyn C; Schwarz, Stefan; Cattoir, Vincent; Philippon, Alain; Allesoe, Rosa L; Rebelo, Ana Rita; Florensa, Alfred Ferrer; Fagelhauer, Linda; Chakraborty, Trinad; Neumann, Bernd; Werner, Guido; Bender, Jennifer K; Stingl, Kerstin; Nguyen, Minh; Coppens, Jasmine; Xavier, Basil Britto; Malhotra-Kumar, Surbhi; Westh, Henrik; Pinholt, Mette; Anjum, Muna F; Duggett, Nicholas A; Kempf, Isabelle; Nykäsenoja, Suvi; Olkkola, Satu; Wieczorek, Kinga; Amaro, Ana; Clemente, Lurdes; Mossong, Joël; Losch, Serge; Ragimbeau, Catherine; Lund, Ole; Aarestrup, Frank M.

J Antimicrob Chemother ; 75(12): 3491-3500, 2020 12 01.

Article in English | MEDLINE | ID: mdl-32780112

ABSTRACT

OBJECTIVES: WGS-based antimicrobial susceptibility testing (AST) is as reliable as phenotypic AST for several antimicrobial/bacterial species combinations. However, routine use of WGS-based AST is hindered by the need for bioinformatics skills and knowledge of antimicrobial resistance (AMR) determinants to operate the vast majority of tools developed to date. By leveraging ResFinder and PointFinder, two freely accessible tools that can also assist users without bioinformatics skills, we aimed at increasing their speed and providing an easily interpretable antibiogram as output. METHODS: The ResFinder code was re-written to process raw reads and use Kmer-based alignment. The existing ResFinder and PointFinder databases were revised and expanded. Additional databases were developed including a genotype-to-phenotype key associating each AMR determinant with a phenotype at the antimicrobial compound level, and species-specific panels for in silico antibiograms. ResFinder 4.0 was validated using Escherichia coli (n = 584), Salmonella spp. (n = 1081), Campylobacter jejuni (n = 239), Enterococcus faecium (n = 106), Enterococcus faecalis (n = 50) and Staphylococcus aureus (n = 163) exhibiting different AST profiles, and from different human and animal sources and geographical origins. RESULTS: Genotype-phenotype concordance was ≥95% for 46/51 and 25/32 of the antimicrobial/species combinations evaluated for Gram-negative and Gram-positive bacteria, respectively. When genotype-phenotype concordance was <95%, discrepancies were mainly linked to criteria for interpretation of phenotypic tests and suboptimal sequence quality, and not to ResFinder 4.0 performance. CONCLUSIONS: WGS-based AST using ResFinder 4.0 provides in silico antibiograms as reliable as those obtained by phenotypic AST at least for the bacterial species/antimicrobial agents of major public health relevance considered.

Subject(s)

Anti-Bacterial Agents , Drug Resistance, Bacterial , Animals , Anti-Bacterial Agents/pharmacology , Genotype , Humans , Microbial Sensitivity Tests , Phenotype

15.

In Silico Genotyping of Escherichia coli Isolates for Extraintestinal Virulence Genes by Use of Whole-Genome Sequencing Data.

Malberg Tetzschner, Anna Maria; Johnson, James R; Johnston, Brian D; Lund, Ole; Scheutz, Flemming.

J Clin Microbiol ; 58(10)2020 09 22.

Article in English | MEDLINE | ID: mdl-32669379

ABSTRACT

Extraintestinal pathogenic Escherichia coli (ExPEC) is the leading cause in humans of urinary tract infection and bacteremia. The previously published web tool VirulenceFinder (http://cge.cbs.dtu.dk/services/VirulenceFinder/) uses whole-genome sequencing (WGS) data for in silico characterization of E. coli isolates and enables researchers and clinical health personnel to quickly extract and interpret virulence-relevant information from WGS data. In this study, 38 ExPEC-associated virulence genes were added to the existing E. coli VirulenceFinder database. In total, 14,441 alleles were downloaded. A total of 1,890 distinct alleles were added to the database after removal of redundant sequences and analysis of the remaining alleles for open reading frames (ORFs). The database now contains 139 genes, of which 44 are related to ExPEC, and 2,826 corresponding alleles. Construction of the database included validation against 27 primer pairs from previous studies, a search for serotype-specific P fimbriae papA alleles, and a BLASTn confirmation of seven genes (etsC, iucC, kpsE, neuC, sitA, tcpC, and terC) not covered by the primers. The augmented database was evaluated using (i) a panel of nine control strains and (ii) 288 human-source E. coli strains classified by PCR as ExPEC and non-ExPEC. We observed very high concordance (average, 93.4%) between PCR and WGS findings, but WGS identified more alleles. In conclusion, the addition of 38 ExPEC-associated genes and the associated alleles to the E. coli VirulenceFinder database allows for a more complete characterization of E. coli isolates based on WGS data, which has become increasingly important considering the plasticity of the E. coli genome.

Subject(s)

Escherichia coli Infections , Escherichia coli Proteins , Computer Simulation , Escherichia coli/genetics , Escherichia coli Proteins/genetics , Genotype , Humans , Membrane Transport Proteins , Phylogeny , Virulence/genetics , Virulence Factors/genetics

16.

An interactive database for the investigation of high-density peptide microarray guided interaction patterns and antivenom cross-reactivity.

Krause, Kamille E; Jenkins, Timothy P; Skaarup, Carina; Engmark, Mikael; Casewell, Nicholas R; Ainsworth, Stuart; Lomonte, Bruno; Fernández, Julián; Gutiérrez, José M; Lund, Ole; Laustsen, Andreas H.

PLoS Negl Trop Dis ; 14(6): e0008366, 2020 06.

Article in English | MEDLINE | ID: mdl-32579606

ABSTRACT

Snakebite envenoming is a major neglected tropical disease that affects millions of people every year. The only effective treatment against snakebite envenoming consists of unspecified cocktails of polyclonal antibodies purified from the plasma of immunized production animals. Currently, little data exists on the molecular interactions between venom-toxin epitopes and antivenom-antibody paratopes. To address this issue, high-density peptide microarray (hdpm) technology has recently been adapted to the field of toxinology. However, analysis of such valuable datasets requires expert understanding and, thus, complicates its broad application within the field. In the present study, we developed a user-friendly and high-throughput web application named "Snake Toxin and Antivenom Binding Profiles" (STAB Profiles), to allow straightforward analysis of hdpm datasets. To test our tool and evaluate its performance with a large dataset, we conducted hdpm assays using all African snake toxin protein sequences available in the UniProt database at the time of study design, together with eight commercial antivenoms in clinical use in Africa, thus representing the largest venom-antivenom dataset to date. Furthermore, we introduced a novel method for evaluating raw signals from a peptide microarray experiment and a data normalization protocol enabling intra-microarray and even inter-microarray chip comparisons. Finally, these data, alongside all the data from previous similar studies by Engmark et al., were preprocessed according to our newly developed protocol and made publicly available for download through the STAB Profiles web application (http://tropicalpharmacology.com/tools/stab-profiles/). With these data and our tool, we were able to gain key insights into toxin-antivenom interactions and were able to differentiate the ability of different antivenoms to interact with certain toxins of interest. The data, as well as the web application, we present in this article should be of significant value to the venom-antivenom research community. Knowledge gained from our current and future analyses of this dataset carries the potential to guide the improvement and optimization of current antivenoms for maximum patient benefit, as well as aid the development of next-generation antivenoms.

Subject(s)

Antivenins/pharmacology , Cross Reactions , Data Management , Peptides , Protein Array Analysis/methods , Africa , Animals , Binding Sites , Epitopes/chemistry , Humans , Snake Bites/therapy , Snake Venoms/chemistry , Snakes/classification , Snakes/metabolism

17.

CCMetagen: comprehensive and accurate identification of eukaryotes and prokaryotes in metagenomic data.

Marcelino, Vanessa R; Clausen, Philip T L C; Buchmann, Jan P; Wille, Michelle; Iredell, Jonathan R; Meyer, Wieland; Lund, Ole; Sorrell, Tania C; Holmes, Edward C.

Genome Biol ; 21(1): 103, 2020 04 28.

Article in English | MEDLINE | ID: mdl-32345331

ABSTRACT

There is an increasing demand for accurate and fast metagenome classifiers that can not only identify bacteria, but all members of a microbial community. We used a recently developed concept in read mapping to develop a highly accurate metagenomic classification pipeline named CCMetagen. The pipeline substantially outperforms other commonly used software in identifying bacteria and fungi and can efficiently use the entire NCBI nucleotide collection as a reference to detect species with incomplete genome data from all biological kingdoms. CCMetagen is user-friendly, and the results can be easily integrated into microbial community analysis software for streamlined and automated microbiome studies.

Subject(s)

Bacteria/classification , Eukaryota/classification , Fungi/classification , Metagenomics/methods , Software , Animals , Archaea/classification , Archaea/genetics , Bacteria/genetics , Birds/microbiology , Eukaryota/genetics , Fungi/genetics , Gene Expression Profiling

18.

Accelerating surveillance and research of antimicrobial resistance - an online repository for sharing of antimicrobial susceptibility data associated with whole-genome sequences.

Matamoros, Sébastien; Hendriksen, Rene S; Pataki, Bálint Ármin; Pakseresht, Nima; Rossello, Marc; Silvester, Nicole; Amid, Clara; Aarestrup, Frank M; Koopmans, Marion; Cochrane, Guy; Csabai, Istvan; Lund, Ole; Schultsz, Constance.

Microb Genom ; 6(5), 2020 05.

Article in English | MEDLINE | ID: mdl-32255760

ABSTRACT

Antimicrobial resistance (AMR) is an emerging threat to modern medicine. Improved diagnostics and surveillance of resistant bacteria require the development of next-generation analysis tools and collaboration between international partners. Here, we present the 'AMR Data Hub', an online infrastructure for storage and sharing of structured phenotypic AMR data linked to bacterial whole-genome sequences. Leveraging infrastructure built by the European COMPARE Consortium and structured around the European Nucleotide Archive (ENA), the AMR Data Hub already provides an extensive data collection of more than 2500 isolates with linked genome and AMR data. Representing these data in standardized formats, we provide tools for the validation and submission of new data and services supporting search, browse and retrieval. The current collection was created through a collaboration by several partners from the European COMPARE Consortium, demonstrating the capacities and utility of the AMR Data Hub and its associated tools. We anticipate growth of content and offer the hub as a basis for future research into methods to explore and predict AMR.

Subject(s)

Anti-Bacterial Agents/pharmacology , Bacteria/genetics , Drug Resistance, Bacterial , Whole Genome Sequencing/methods , Bacteria/drug effects , Databases, Genetic , High-Throughput Nucleotide Sequencing , Internet , Phenotype

19.

Large scale automated phylogenomic analysis of bacterial isolates and the Evergreen Online platform.

Szarvas, Judit; Ahrenfeldt, Johanne; Cisneros, Jose Luis Bellod; Thomsen, Martin Christen Frølund; Aarestrup, Frank M; Lund, Ole.

Commun Biol ; 3(1): 137, 2020 03 20.

Article in English | MEDLINE | ID: mdl-32198478

ABSTRACT

Public health authorities whole-genome sequence thousands of isolates each month for microbial diagnostics and surveillance of pathogenic bacteria. The computational methods have not kept up with the deluge of data and the need for real-time results. We have therefore created a bioinformatics pipeline for rapid subtyping and continuous phylogenomic analysis of bacterial samples, suited for large-scale surveillance. The data is divided into sets by mapping to reference genomes, then consensus sequences are generated. Nucleotide based genetic distance is calculated between the sequences in each set, and isolates are clustered together at 10 single-nucleotide polymorphisms. Phylogenetic trees are inferred from the non-redundant sequences and the clustered isolates are added back. The method is accurate at grouping outbreak strains together, while discriminating them from non-outbreak strains. The pipeline is applied in Evergreen Online, which processes publicly available sequencing data from foodborne bacterial pathogens on a daily basis, updating phylogenetic trees as needed.
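The clustering step in this abstract (nucleotide distances between consensus sequences, with isolates grouped at 10 single-nucleotide polymorphisms before tree inference) can be sketched in Python as follows. This is a simplified illustration, not the Evergreen pipeline itself: the function names `snp_distance` and `cluster_isolates` are hypothetical, the consensus sequences are assumed to be the same length because they derive from the same reference, positions with ambiguous bases are assumed to be ignored, and "at 10 SNPs" is read as a distance of at most 10 to a cluster representative.

```python
# Illustrative sketch only; assumptions are noted in the comments.
def snp_distance(a: str, b: str) -> int:
    """Count positions where two equal-length consensus sequences differ.
    Positions where either base is not A/C/G/T are skipped (assumed)."""
    if len(a) != len(b):
        raise ValueError("consensus sequences must have equal length")
    valid = "ACGT"
    return sum(1 for x, y in zip(a, b) if x != y and x in valid and y in valid)


def cluster_isolates(sequences: dict, threshold: int = 10) -> dict:
    """Greedy clustering: each isolate joins the first cluster whose
    representative is within `threshold` SNPs, otherwise it becomes the
    representative of a new cluster. Returns {representative: [members]}."""
    clusters = {}
    for name, seq in sequences.items():
        for rep in clusters:
            if snp_distance(sequences[rep], seq) <= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]  # no close representative found
    return clusters
```

In this reading, only the representatives (the non-redundant sequences) would go into tree inference, and the remaining members of each cluster would be attached to their representative afterwards.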

Subject(s)

Bacteria/genetics , Computational Biology , DNA, Bacterial/genetics , Environmental Monitoring , Foodborne Diseases/microbiology , Online Systems , Phylogeny , Polymorphism, Single Nucleotide , Whole Genome Sequencing , Automation, Laboratory , Bacteria/classification , Bacteria/isolation & purification , Bacteria/pathogenicity , DNA, Bacterial/isolation & purification , Workflow

20.

Metaphylogenetic analysis of global sewage reveals that bacterial strains associated with human disease show less degree of geographic clustering.

Ahrenfeldt, Johanne; Waisi, Madina; Loft, Isabella C; Clausen, Philip T L C; Allesøe, Rosa; Szarvas, Judit; Hendriksen, Rene S; Aarestrup, Frank M; Lund, Ole.

Sci Rep ; 10(1): 3033, 2020 02 20.

Article in English | MEDLINE | ID: mdl-32080241

ABSTRACT

Knowledge about the difference in the global distribution of pathogens and non-pathogens is limited. Here, we investigate it using a multi-sample metagenomics phylogeny approach based on short-read metagenomic sequencing of sewage from 79 sites around the world. For each metagenomic sample, bacterial template genomes were identified in a non-redundant database of whole genome sequences. Reads were mapped to the templates identified in each sample. Phylogenetic trees were constructed for each template identified in multiple samples. The countries from which the samples were taken were grouped according to different definitions of world regions. For each tree, the tendency for regional clustering was determined. Phylogenetic trees representing 95 unique bacterial templates were created covering 4 to 71 samples. Varying degrees of regional clustering could be observed. The clustering was most pronounced for environmental bacterial species and human commensals, and less for colonizing opportunistic pathogens, opportunistic pathogens and pathogens. No pattern of significant difference in clustering between any of the organism classifications and country groupings according to income was observed. Our study suggests that while the same bacterial species might be found globally, there is a geographical regional selection or barrier to spread for individual clones of environmental and human commensal bacteria, whereas this is to a lesser degree the case for strains and clones of human pathogens and opportunistic pathogens.

Subject(s)

Bacteria/classification , Disease , Geography , Metagenomics , Phylogeny , Sewage/microbiology , Bacteria/genetics , Cluster Analysis , Databases, Genetic , Genome, Bacterial , Humans , Templates, Genetic
